[WIP] LM Workload #860

rka97 · 2025-04-03T17:44:18Z

This is for the LM workload.

rka97 · 2025-10-21T05:10:46Z

Adding some TODOs:

- Add model-matching tests (i.e., for the same inputs, the outputs should be the same, similar to model_match).
- Fix initialization to be the same for both PyTorch and JAX (there seem to be some minor differences).
- Add an integration test for the LM workload to GitHub Actions.
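
As a rough illustration of the first TODO, a model-matching check could look something like the sketch below. It uses only the standard library, and every name in it (`outputs_match`, `softmax_reference`, `softmax_stable`) is hypothetical: the two softmax functions merely stand in for the two framework implementations and are not code from this PR.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def outputs_match(
    model_a: Callable[[Vector], Vector],
    model_b: Callable[[Vector], Vector],
    inputs: Vector,
    rel_tol: float = 1e-5,
    abs_tol: float = 1e-6,
) -> bool:
  """Returns True if both callables give element-wise close outputs."""
  out_a = list(model_a(inputs))
  out_b = list(model_b(inputs))
  if len(out_a) != len(out_b):
    return False
  return all(
      math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
      for a, b in zip(out_a, out_b)
  )


# Hypothetical stand-ins for two implementations of the same model.
def softmax_reference(logits: Vector) -> Vector:
  exps = [math.exp(x) for x in logits]
  total = sum(exps)
  return [e / total for e in exps]


def softmax_stable(logits: Vector) -> Vector:
  # Same function, computed with the max subtracted for numerical stability.
  shift = max(logits)
  exps = [math.exp(x - shift) for x in logits]
  total = sum(exps)
  return [e / total for e in exps]


print(outputs_match(softmax_reference, softmax_stable, [0.1, 2.0, -1.5]))
```

In a real test the callables would wrap the two framework models with identical weights, and the outputs would be converted to plain arrays before comparison. The tolerances here are placeholders and would need tuning.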

priyakasimbeg

Second round of small requested changes.

One thing we should perhaps discuss: we need a more descriptive name for the workload, e.g. fineweb_edu_lm. What do you all think? @Niccolo-Ajroldi @rka97

priyakasimbeg · 2025-10-24T07:41:52Z

algoperf/workloads/finewebedu_lm/finewebedu_lm_jax/models.py

@@ -0,0 +1,397 @@
+"""
+Originally based on code from the NanoDO repository under the Apache 2.0 license:


Can we rename this file to models.py to be consistent with the pattern in the other workload definitions?

priyakasimbeg · 2025-10-24T07:42:15Z

algoperf/workloads/finewebedu_lm/finewebedu_lm_pytorch/models.py

@@ -0,0 +1,344 @@
+"""
+Originally based on the plainLM codebase:


Can we rename this file to models.py to be consistent with the other workload definitions?

priyakasimbeg · 2025-10-24T07:43:46Z

algoperf/workloads/workloads.py

    'workload_path': 'librispeech_deepspeech/librispeech',
    'workload_class_name': 'LibriSpeechDeepSpeechNormAndSpecAugWorkload',
  },
+  'lm': {'workload_path': 'lm/lm', 'workload_class_name': 'LmWorkload'},


Now that we have all the important implementation details figured out, should we pick a more descriptive name for the workload? I am thinking perhaps 'fineweb_edu_lm'.

fineweb_edu_lm or finewebedu_lm make sense, matching the other workload names.

Changed to finewebedu_lm.
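
For illustration only, the re-keying discussed above can be sketched as follows. The `WORKLOADS` dict reproduces just the single entry visible in the diff, and `rename_workload` is a hypothetical helper, not part of the repository. The thread does not show the final `workload_path` or `workload_class_name`, so the sketch leaves both untouched.

```python
import copy

# The one registry entry visible in the diff under review.
WORKLOADS = {
    'lm': {'workload_path': 'lm/lm', 'workload_class_name': 'LmWorkload'},
}


def rename_workload(registry, old_name, new_name):
  """Returns a copy of the registry with old_name re-keyed as new_name."""
  if old_name not in registry:
    raise KeyError(f'Unknown workload: {old_name}')
  if new_name in registry:
    raise ValueError(f'Workload name already in use: {new_name}')
  renamed = copy.deepcopy(registry)
  # Only the key changes; path and class name are left as they were.
  renamed[new_name] = renamed.pop(old_name)
  return renamed


renamed = rename_workload(WORKLOADS, 'lm', 'finewebedu_lm')
print(sorted(renamed))
```

Any code that looks the workload up by its old key (for example the per-workload batch size branches in the baseline submissions) would need the same rename.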

priyakasimbeg · 2025-10-24T07:45:39Z

algorithms/archived_paper_baselines/adamw/pytorch/submission.py

  elif workload_name == 'mnist':
    return 16
+  elif workload_name == 'lm':
+    return 4


This should work for bsz 64, right?

Yes, I only made it smaller because I was debugging on V100s. Changed it back to 64.
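
A minimal sketch of what the branch might look like after this change, assuming the enclosing function is named `get_batch_size` (the diff does not show the function name) and reproducing only the two branches visible in the hunk. The `'lm'` key is the one shown in the diff and may differ after the workload rename.

```python
def get_batch_size(workload_name):
  """Returns the batch size for a workload (only two branches sketched)."""
  if workload_name == 'mnist':
    return 16
  elif workload_name == 'lm':
    # Was 4 while debugging on V100s; restored to 64 per the review.
    return 64
  else:
    raise ValueError(f'Unsupported workload {workload_name}.')


print(get_batch_size('lm'))
```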

priyakasimbeg · 2025-10-24T07:46:14Z

algorithms/archived_paper_baselines/nesterov/jax/submission.py

  elif workload_name == 'cifar':
    return 128
+  elif workload_name == 'lm':
+    return 8


This should work for bsz 64, right?

Yes, I only made it smaller because I was debugging on V100s. Changed it back to 64.

priyakasimbeg and others added 30 commits February 27, 2025 14:56

Merge pull request #847 from mlcommons/dev

1d81455

Dev -> main

first LM commit

da5f85a

lm data pipeline

a12a364

testing

ca83ab8

LM workload tested torch pipeline

e3e78dc

LM workload - fix torch tests

e619495

add LM tests, remove dev files

d8e9c56

add LM tests, remove dev files

6b4ff12

Stop tracking .gitignore

3c5c847

Remove dev/ from repo, keep locally

20d841b

fix comments

f3ba059

add class specifications

381451f

add workload LM info

f111d2e

restore data_utils.py tree map

808d398

fixed NFS bug

35f8f89

train/val split before concat

cbb6ee6

renamed datasets to avoid conflict with HF

868987c

Merge remote-tracking branch 'upstream/lm_workload' into lm_workload

8191f6d

renamed datasets to dataset

dd59ded

fix style

496b9c3

fix formatting

50989eb

fix style

5af0fdc

fix style

2683099

fix yapf

6b7ee29

fix style

46b645b

HF datasets pipeline

b3ae647

Testing with linear model

f095d4b

Merge branch 'jit_switch' into lm_workload

4189ae0

lm workload with linear model

0c22f3d

add nanodo model

99c7b9b

priyakasimbeg and others added 9 commits October 20, 2025 17:26

label smoothing default fix

42d1d1a

finish merge

c334c97

Make sure to take the correct number of batches in lm

d95f2bf

Merge branch 'lm_workload' of github.com:mlcommons/algorithmic-efficiency into lm_workload

7deb070

Properly handle repetition in LM training and evaluation splits

0dc16db

move eval_batch from shared class to framework specific classes since pytorch calls detach

7edb702

finish merge

0879e68

Refactor imports and clean up unused code in LM workload and related modules

73e3ea6

pass linter checks

91988af

priyakasimbeg requested changes Oct 21, 2025

algoperf/workloads/finewebedu_lm/input_pipeline.py (resolved)

algoperf/workloads/finewebedu_lm/input_pipeline.py (resolved)

Refactor loss function in LM workloads to unify label handling and improve clarity

bb4a380

rka97 force-pushed the lm_workload branch from f6a705d to bb4a380 on October 21, 2025 05:06

rka97 added 2 commits October 21, 2025 08:46

Fix init in both models to be the same, add lm model diff test

a58fbd5

Refactor model configuration classes to make them consistent between JAX and PyTorch, also unify initialization to be the same in both

b59afa0

rka97 force-pushed the lm_workload branch from 2251c3e to b59afa0 on October 21, 2025 09:07

Add query-key normalization to CausalAttn and Attention classes, including learned scaling factor

d35cdde

rka97 force-pushed the lm_workload branch from bbde48b to d35cdde on October 23, 2025 17:11

priyakasimbeg requested changes Oct 24, 2025

priyakasimbeg and others added 11 commits October 24, 2025 19:46

update target

ffb8163

Merge branch 'lm_workload' of github.com:mlcommons/algorithmic-efficiency into lm_workload

2cc9dff

add pytorch nadamw_target_setting

202e5cb

docker updates for a100

98e491a

update budgets for a100 hardware weightclass

f0f7774

formatting

b93eb3c

revert changes to docker build shell script

88b0e47

fix merge conflict

fa946d8

merge

2442519

rename models.py

02f835d

rename workload

0abf39d
